NSF PAR Search | NSF Public Access Repository


Locality-based transfer learning on compression autoencoder for efficient scientific data lossy compression

https://doi.org/10.1016/j.jnca.2022.103452

Wang, Nan; Liu, Tong; Wang, Jinzhen; Liu, Qing; Alibhai, Shakeel; He, Xubin (September 2022, Journal of Network and Computer Applications)

Full Text Available
Locality-based transfer learning on compression autoencoder for high-performance lossy compression of scientific data

Wang, Nan; Liu, Tong; Wang, Jinzhen; Liu, Qing; Alibhai, Shakeel; He, Xubin (January 2022, Journal of Network and Computer Applications)

Full Text Available
High-Ratio Lossy Compression: Exploring the Autoencoder to Compress Scientific Data

https://doi.org/10.1109/TBDATA.2021.3066151

Liu, Tong; Wang, Jinzhen; Liu, Qing; Alibhai, Shakeel; Lu, Tao; He, Xubin (January 2021, IEEE Transactions on Big Data)
Full Text Available
Exploring Transfer Learning to Reduce Training Overhead of HPC Data in Machine Learning

https://doi.org/10.1109/NAS.2019.8834723

Liu, Tong; Alibhai, Shakeel; Wang, Jinzhen; Liu, Qing; He, Xubin; Wu, Chentao (September 2019, 2019 IEEE International Conference on Networking, Architecture and Storage (NAS))

Scientific simulations on high-performance computing (HPC) systems can generate terabytes or petabytes of data per run. When machine learning applications process HPC data at this scale, the training overhead is significant. Training a neural network typically takes several hours, if not longer; on HPC scientific data, training can take several days or even weeks. Transfer learning, an optimization usually used to save training time or achieve better performance, has the potential to reduce this large training overhead. In this paper, we apply transfer learning to a machine learning HPC application. We find that transfer learning can reduce training time without, in most cases, significantly increasing the error. This indicates that transfer learning can be very useful for working with HPC datasets in machine learning applications.
Full Text Available
Reference-Counter Aware Deduplication in Erasure-Coded Distributed Storage System

https://doi.org/10.1109/NAS.2018.8515697

Liu, Tong; He, Xubin; Alibhai, Shakeel; Wu, Chentao (October 2018, 2018 IEEE International Conference on Networking, Architecture and Storage (NAS))

In modern distributed storage systems, space efficiency and system reliability are two major concerns. As a result, contemporary storage systems often employ data deduplication and erasure coding to reduce the storage overhead and provide fault tolerance, respectively. However, little work has been done to explore the relationship between these two techniques. In this paper, we propose Reference-counter Aware Deduplication (RAD), which incorporates features of deduplication into erasure coding to improve garbage collection (GC) performance when deletion occurs. RAD encodes the data according to the reference counter, which is provided by the deduplication level, and thus reduces the encoding overhead when garbage collection is conducted. Further, since the reference counter also represents the reliability levels of the data chunks, we also explore the trade-offs between storage overhead and reliability level among different erasure codes. The experimental results show that RAD can improve GC performance by up to 24.8%, and the reliability analysis shows that, with certain data features, RAD can provide both better reliability and better storage efficiency than the traditional Round-Robin placement.
Full Text Available
